Speech input is converted to digital data (3) and undergoes spectral analysis (4). The spectrum is analyzed to identify phones (5)
using a standard neural network implemented using stored weights (8). The phones are further combined to identify phonemes (6). The 
phonemes are then translated into a different language (7) based on a stored language dictionary (9) and converted to text output 
DESCRIPTION

Multi-Language Speech Recognition System

Field of Invention 

The present invention relates to speech recognition 
systems and methods. 

Background 

The prior art includes many systems and methods for
transcribing speech. One of the major differences between
them is the level of difficulty of the speech recognition
task they are intended to perform. The simplest such task
is the recognition of a small number of acoustically
distinct words spoken in isolation (often called discrete
speech). U.S. Pat. No. 4,910,784 to Doddington et al.
("Low Cost Speech Recognition System and Method") is an
example of the prior art of this class of system. Such
systems are useful, for example, for giving a small set of
commands to operate a computer, but cannot handle
continuous speech. A more difficult type of task is the
identification of one or more designated words occurring
in a continuous stream of words, or "word spotting". U.S.
Pat. No. 4,937,870 to Bossemeyer, Jr. is an example of the
prior art of this class of system. Such systems might be
used, for example, in a telephone application for
identifying key words or phrases within an utterance such
as "credit card", "collect", "third party", etc., but
cannot transcribe continuous speech. A still more difficult
type of task is the recognition of all words in a complete
sentence where the words are spoken in isolation and the
grammatical structure of the sentence is prescribed. U.S.
Pat. No. 4,882,757 to Fisher et al. ("Speech Recognition
System") is an example of the prior art of this class of
system. Such systems can be useful in applications where
the speaker is willing to accept speaking in an unnatural
manner to accommodate the needs of the system. An even
more difficult speech recognition task is the recognition
of all words in a complete sentence when the words are
connected (generally referred to as continuous speech),
the grammatical structure of the sentence is prescribed
and the lexicon is constrained. U.S. Pat. No. 5,040,127
to Gerson ("Continuous Speech Recognition System") is an
example of the prior art of this class of system. Such
systems can be useful in task-specific applications where
the user is aware of the system vocabulary and grammar
constraints and able to modify his or her speech pattern
accordingly. The most difficult type of task is the
recognition of all words in a continuous, spontaneous
utterance that may have no structure and indeed may be
ungrammatical in form. U.S. Pat. No. 4,852,170 to
Bordeaux ("Real Time Computer Speech Recognition System")
is an example of the prior art of this class of system.

Systems and methods of speech recognition may also be
classified according to whether they are speaker-dependent;
i.e., must be trained by a particular speaker prior to that
speaker making use of the machine, or whether they are
speaker-independent; i.e., a particular speaker need not
train the machine prior to using it. A variation of the
speaker-dependent type is the speaker-adaptive system, which
aims to make the training of the machine easier and faster.
Speaker-independent systems are more difficult to achieve
than are speaker-dependent ones; however, in most
applications they have much greater utility. The present
system described herein is speaker-independent.

Speech recognition systems and methods further may be
classified as to the lowest phonetic unit that they
identify. Every system is provided a set of spectral
reference patterns for each of the lowest phonetic units
to which the incoming speech signal is compared to seek a
best match for identification. The largest such unit is
a whole word (or a small group of words). Systems operating
with reasonable accuracy at this phonetic level generally
are limited to small-vocabulary, discrete-speech
applications. Methods which aim to identify phonemes for
aggregation into words are represented in the prior art
across classes of speaker-dependent and
independent/discrete- and continuous-speech systems.
Difficulty in achieving reliability is encountered in such
systems as larger vocabularies introduce more similar
sounding words and as multiple speakers introduce
different pronunciations of the same words. Methods of
identifying phones (i.e., sub-phoneme units of speech) aim
to achieve improved reliability by identifying more but
smaller segments of the speech signals. The present
system described herein includes a method for accurately
identifying phones.

Speech recognition systems and methods may be still
further classified according to their modelling of the
speech process. Some methods describe the process as a
series of acoustic events. This model has been applied
primarily to phoneme recognition. In such a model, the
speech signal is first segmented into occurrences of
classes of phonemes such as vowels (/IY/, /OW/, etc.),
fricatives (/F/, /S/, etc.), stops (/D/, /T/, etc.) and so
forth. Then the specific phoneme within the class is
identified. A second model takes the view that it is not
possible to analyze the speech process directly but that
it can be usefully analyzed in statistical terms. The
Hidden Markov Model is an example of this view of the
speech process. In this model, segments of the speech
signal are considered as (spectral) states of the system
and transitions from one state to any other are
probabilistic. Each phoneme or phone is described in
terms of a sequence of state changes. The probabilities
of transition between spectral states of an incoming
speech signal are calculated to determine probable
correspondence to each of the target sequences and thereby
determine probable phoneme or phone identification. It is
difficult to achieve high reliability with this method in
large vocabulary speaker-independent systems because of
the much larger number of possible spectral states
compared to the number of spectral states in a
speaker-dependent system. A third model views the speech
signal as a sequence of spectral patterns; i.e., a directly
observable representation of the signal. This is the
model that is employed in the present invention as will be
described in detail later.

All speech recognition methods are based on comparing
the characteristics of the unknown speech signal with a
reference set of examples to determine when a "good" match
occurs (identification). Thus, another way of classifying
speech recognition systems and methods is on the basis of
how the reference data is derived to which the unknown
speech signal is compared for identifying a word, phoneme
or phone. In a "rules are given" system, the system
designer directly provides the machine with the reference
data to be used for determining best matches. The designer
devises the shapes of the templates or calculates the
state transition probabilities as in a Hidden Markov Model
approach. Speaker-independent applications result in a
need for a large number of spectral states to accommodate
the wide variations in speakers' voices. Spectral states
that are similar may be aggregated but at some loss of
representational accuracy and hence reliability of
identification. In a "rules are learned" system (e.g., an
artificial neural network), the designer provides the
system with a very large number of examples of spectra of
each phone of interest and their identification. The
system is run in a training mode and the neural network
"learns" how to distinguish one phone from all the others.
When run in an application, the neural network determines
the probability that the segment of signal encountered is
each of the phone possibilities. Selection is made when
specified probability threshold criteria are met. This is
the method used in the present invention. An important
advantage of this approach in speaker-independent
applications is that its reliability can be improved with
the number of speakers using it.

A final way of classifying speech recognition systems
relates to the aids to word identification employed, if
any. In a "context-free" scheme, the string of phones or
phonemes is compared to lexicon or dictionary entries to
identify each word directly. In a "context-assisted"
scheme, devices such as allowable word pairs, constrained
grammar and/or syntax and the like are used to improve
reliability of word identification. The present invention
is context-free.

Most of the speech recognition methods described by
the prior art can be modified for application to other
languages. However, those methods that depend on
analytical devices such as allowable word order, grammar
and/or syntax to assist in word identification require
separate and duplicative effort for cross-language
implementation. In an era of global communication and
commerce, there is a need for a language-independent
system that has not heretofore been addressed by the prior
art. The design and implementation of such a system will
exploit the overlap in the speech sounds used in different
languages. Exploitation of the common usage of sounds
between languages requires application of a more detailed
understanding of speech production and the resultant
speech signal than has been the case in the prior art.

Summary of the Invention 

The prior art has not taught the construction of
devices able to mimic the human capability to
recognize phones; i.e., "a speech sound considered as a
physical event without regard to its place in the sound
system of a language." (Webster's Ninth New Collegiate
Dictionary; Merriam-Webster Inc., Publishers; Springfield,
MA; 1991) "Human languages display a wide variety of
sounds, called phones or speech sounds. There are a great
many speech sounds, but not an infinite number of
them. . . . The class of possible speech sounds is not only
finite, it is universal. A portion of the total set will
be found in the inventory of any human language."
(Contemporary Linguistics: An Introduction; William
O'Grady, Michael Dobrovolsky, Mark Aronoff; St. Martin's
Press; New York; 1989).

It is an object of my invention to provide a system 
and method for recognizing the total set of speech sounds 
(or phones) in human languages. 

It is another object of my invention to provide a
system and method for transcribing the speech of arbitrary
speakers in one of many languages including when such
speech is continuous and conversational.

It is yet another object of the present invention to
provide a system and method for processing the speech
signal to yield an accurate determination of all the
frequencies contained in that signal and their amplitudes.

It is a further object of the present invention to
emulate the human hearing processes to provide a system
and method for unique direct observation of the perceived
speech signal at very short time intervals.

It is yet another object of the present invention to
address the phones in a language as fuzzy sets; i.e., as
all speech signals having a probabilistic membership in
all phone sets.

It is a still further object of the invention to
provide an artificial neural network system and method for
determining the probable phone represented during each
very short time interval.

It is a further object of the invention to provide a
unique method of employing the artificial neural network
to identify the time during the utterance of a phone which
represents the closest approach of the vocal tract
configuration to a target position; i.e., when there is
the maximum likelihood of the signal representing the
intended phone.

It is another object of the invention to provide a
method for accommodating multiple pronunciations of the
same word.

It is yet another object of the invention to provide
a method of separating words that often are run together
in conversational speech due to coarticulation.

It is still another object of the invention to
provide a method of exploiting the common usage of some
phones between languages so that the inclusion of other
languages is efficiently accomplished with the time
required for each new language decreasing with the number
of languages included.

Exploitation of the common usage of sounds between
languages requires application of a more detailed
representation of speech production, the resultant coding
of the speech signal, and emulation of the
neuro-physiological mechanisms of hearing and pattern
recognition that decode that signal to allow speech
recognition than has been the case in the prior art. The
present invention emulates the concurrent processes
occurring in humans recognizing speech; i.e., spectrum
analysis, speech sound identification and word
recognition. The frequency response and sensitivity of
human hearing are mimicked, an artificial neural network is
included to represent the pattern recognition apparatus of
the brain and logical processes are included to emulate
our translation of spoken sounds into written words.

These and other objects and features of the present
invention will be better understood through a
consideration of the following description taken with the
drawings in which:

Figure 1 is a logical diagram of the system.

Figures 2A-2C are illustrations of a simplified
source-filter decomposition of a voiced sound. Figure 2A
is a typical source spectrum, 2B is a representative vocal
transmission filter function, and 2C is a spectrum of a
radiated vowel.

Figure 3 is a graph of frequency discrimination
versus frequency and loudness of a tone.

Figure 4 shows the relative response of some narrow
bandpass filters versus frequency.

Figure 5 shows the frequency response of human hearing in
terms of the intensity of various frequencies required to
produce the same perceived loudness.

Figures 6A-6C illustrate three different concepts of
speech segmentation.

Figures 7A-7E show some estimated articulatory
positions assumed during pronunciation of the word "caw".
Figure 7A is the articulatory position for the phoneme
/K/, 7E is the position for the /AO/, and 7B, 7C, and 7D
are some estimated transition positions between the two.

Figure 8A presents some typical high resolution
spectra for the vowel /AH/ and Figure 8B presents some
spectra for the vowel /OW/.

Figure 9 is a schematic drawing of an artificial
neural network phone identifier.

Figures 10A-10C are high resolution spectrograms for
a particular word sample spoken by a particular speaker,
shown in three parts for convenience.

Figures 11A through 11D show a sample output of an
artificial neural network phone identifier.

Figure 12 is a logical diagram of the phonemic-to-spoken
language translation program.

Figures 13A-13B illustrate an implementation of the
present invention on a currently available microcomputer.
Figure 13A is a side view of the computer and Figure 13B
is a rear view of the computer.

Description of the Invention 

Figure 1 is a logical diagram of the system. It
includes a language selector 1, language modules 2 stored
in non-volatile memory and concurrent processors 3-7, each
of which operates on the transformation of the speech
signal provided by the previous process. Each language
module 2 comprises, for a pre-determined language, the
weights for the neural network 8 to be solved for each
interval of time and a language dictionary 9 containing
the phonemic-to-spoken language translation of the
vocabulary words provided. At start-up, the language
selector 1 displays a menu of stored languages from which
the user selects the one of interest. It retrieves from
storage and passes the neural network parameters and
weights for that language to the neural network phone
identifier 5 and the appropriate language dictionary 9 to
the phoneme string translator 7.

Continuous speech signals then are input into a
conventional analog-to-digital converter 3 and thence to
the spectrum analyzer 4, which operates on the digitized
signal concurrently with the analog-to-digital converter
processing subsequent signals. The spectrum analyzer 4 is
itself a parallel processor as will be described in detail
later. The output of the spectrum analyzer 4 is sent to
the neural network phone identifier 5 where a phoneme,
allophone or other legitimate speech sound in the language
is identified (if a phoneme, allophone or other legitimate
speech sound is present). This operation takes place
concurrently with the analog-to-digital converter 3 and
the spectrum analyzer 4 processing further subsequent
speech signals. The output of the neural network phone
identifier 5 is passed to the phoneme integrator 6 where
various tests are made to ensure that real phonemes,
allophones and other legitimate speech sounds in the
language are separated from fleeting transitions between
them and to combine the allophones and other legitimate
speech sounds into phonemes. As before, the phoneme
integrator 6 is operating on its portion of the speech
signal concurrently with the neural network phone
identifier 5, the spectrum analyzer 4 and the
analog-to-digital converter 3 processing later incoming
portions of the speech signal. As the integration of each
phoneme is completed, it is sent to the phoneme string
translator 7 where it is added onto the end of the existing
phoneme string. When there is a pre-determined minimum
number of phonemes in the string, the phoneme string
translator 7 accesses the language dictionary 9 to parse
the string into the words spoken in the speech stream.
Each of the parts of the system now will be described in
detail.
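The final step of the overview, parsing an accumulated phoneme
string against the language dictionary 9, can be illustrated with
a minimal sketch. The greedy longest-match strategy, the function
name parse_phonemes, the max_len parameter and the toy dictionary
below are assumptions made purely for illustration; this overview
does not state how the phoneme string translator 7 actually
performs the parse.

```python
def parse_phonemes(phonemes, dictionary, max_len=8):
    """Greedy longest-match parse of a phoneme list into words.

    Returns (words, leftover), where leftover holds trailing phonemes
    that do not yet match any dictionary entry.
    """
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in dictionary:
                words.append(dictionary[key])
                i += n
                break
        else:
            # Nothing matches at this position; wait for more phonemes.
            break
    return words, phonemes[i:]


# Hypothetical toy dictionary keyed by phoneme sequences.
TOY_DICTIONARY = {
    ("K", "AO"): "caw",
    ("N", "AH", "T"): "nut",
    ("N", "OW", "T"): "note",
}

words, leftover = parse_phonemes(
    ["K", "AO", "N", "OW", "T", "N", "AH"], TOY_DICTIONARY
)
```

In this sketch the trailing phonemes that do not yet form a word
are simply returned to the caller, which mirrors the idea of
waiting until enough phonemes have accumulated before parsing.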

Analog-To-Digital Converter 

A speech signal is input from a source such as
telephone, microphone or tape recorder and is digitized by
an analog-to-digital converter 3. In the preferred
embodiment, the speech recognition system disclosed herein
digitizes the incoming signal at 8 kHz and incorporates an
anti-aliasing lowpass filter whose response is
approximately 60 dB down at 4000 Hz from its response from
0 to 3800 Hz. In accordance with the current art, the
lowpass filter may be of the analog variety operating on
the input signal prior to digitization or a digital filter
applied after digitization of the analog signal. The
output of the lowpass filter is passed to the spectrum
analyzer 4.

Spectrum Analyzer 

Before describing the spectrum analyzer 4, it is
important to consider the nature of the signal to be
analyzed. Fant in his book "Acoustic Theory of Speech
Production" (Gunnar Fant; Mouton and Company; The Hague,
The Netherlands; 1960) described the spectrum of a
radiated speech sound as the product of a source spectrum
and a vocal transmission filter function as shown in
Figures 2A-2C. The source spectrum is the result of the
vibrating vocal cords producing a fundamental frequency
and its harmonics which decline in amplitude at 6 dB per
octave. The fundamental frequency can range from a low of
about 60 Hz for a man with a bass voice to almost 400 Hz
for a child. The "filter function" results from the
shaping of the vocal tract to produce a particular speech
sound. In prior art utilizing linear predictive coding to
describe a speech sound, the object of investigation has
been the filter function. However, the ear receives the
entire radiated speech sound, not just the filter
function. The linear predictive coding process both
distorts the speech signal and discards some of the
information it contains. The present invention employs an
artificial neural network to identify speech sounds;
therefore, it was considered advantageous to retain as
much signal information as possible by emulating the human
hearing process.

A number of approaches have been utilized in the
prior art to simulate human response to speech sounds, for
example as in Pat. No. 4,905,285 to Allen et al. ("Analysis
Arrangement Based on a Model of Human Neural Responses")
and Pat. No. 4,436,844 to Lyon ("Method and Apparatus for
Simulating Aural Response Information"). In both these
examples of the prior art, the aim is to simulate the
output of the cochlea. The present invention addresses
the problem not solely as one of simulating the output of
the cochlea but one of further representing the speech
signal as it is perceived by the brain. For this purpose,
it is necessary to provide an arrangement of pseudo-hair
cells providing both the frequency discrimination
capability and frequency response of human hearing as
determined by auditory testing. The results of one such
set of tests of frequency discrimination are illustrated
in Figure 3 from "Hearing, Taste and Smell"; by Philip
Whitfield and D. M. Stoddard; Torstar Books; New York;
1985. Figure 3 shows that human ability to discriminate
between two closely spaced tones is dependent on both the
amplitude and frequency of the signal. Higher frequency
tones must be spaced further apart for discrimination and
higher amplitude ones can be discriminated better than can
lower ones.

In order to obtain a representation of the radiated
spectrum of the speech signal comparable to human auditory
perception, the preferred embodiment of the present
invention employs a plurality of very narrow bandpass
filters spaced from 58 to 3800 Hz according to the 10 dB
sound level (upper) curve of Figure 3. Some people with
very good hearing still have good speech perception at
this signal level. This results in a bank of 420 filters
spaced approximately 4 Hz apart beginning at the lowest
frequencies, increasing to approximately 24 Hz between
adjacent filters at the highest frequencies. While this
many filters may present a computational challenge to real
time operation, it is noted that this is a relatively
small number compared to the approximately 10,000 to 12,000
hair cells of the cochlea over the same frequency range.
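As a rough illustration of such a non-uniform filter layout, the
following sketch generates centre frequencies whose spacing grows
from about 4 Hz at 58 Hz to about 24 Hz near 3800 Hz. The linear
interpolation of spacing against frequency, and the name
filter_centers, are assumptions; the preferred embodiment takes its
spacing from the curve of Figure 3, which is not reproduced here,
so this sketch should be expected to yield fewer than 420 filters.

```python
def filter_centers(f_lo=58.0, f_hi=3800.0, step_lo=4.0, step_hi=24.0):
    """Return centre frequencies (Hz) with spacing growing with frequency.

    Spacing is interpolated linearly between step_lo at f_lo and
    step_hi at f_hi. This is an illustrative stand-in for the
    frequency-discrimination curve, not the curve itself.
    """
    centers = [f_lo]
    while True:
        f = centers[-1]
        frac = (f - f_lo) / (f_hi - f_lo)
        step = step_lo + frac * (step_hi - step_lo)
        if f + step > f_hi:
            break
        centers.append(f + step)
    return centers


centers = filter_centers()
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

The list spacings can then be inspected to confirm that adjacent
filters sit about 4 Hz apart at the bottom of the band and
approach 24 Hz at the top.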

Figure 4 is a graphic illustration of a portion of
the filter arrangement around 100 Hz. It can be seen from
Figure 4 that because of the filter spacing of about 4 Hz,
the true frequency of any signal in this frequency range
will be within about 2 Hz of the reported frequency. It is
understood that better frequency resolution may be
obtained by increasing the number of filters such as by
using the frequency discrimination of higher loudness
levels in Figure 3. It is also noted that satisfactory
phone recognition might also be obtained with somewhat
less frequency resolution; i.e., greater spacing between
filters.

The output of each of the bandpass filters is
computed for each sample. At an 8 kHz sampling rate, the
spacing between samples is 0.125 ms. Modern digital signal
processing chips arranged in parallel can provide the
processing power required for real-time operation. For
example, Loral Space Information Systems has developed an
arrangement of five C-programmable Texas Instruments
TMS320C30 DSP chips on two plug-in boards (marketed by
California Scientific Software as the BrainMaker
Professional Accelerator) that can provide adequate
computing speed to solve several hundred filters in real
time. Alternatively, more compact integrated circuits can
be specially designed for the purpose.
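To make the per-sample filter computation concrete, the sketch
below runs one narrow bandpass filter sample by sample at 8 kHz.
The filter type is not specified in this description; a two-pole
digital resonator is used here purely as an assumed stand-in, and
the name resonator_peak and the 4 Hz bandwidth are likewise
hypothetical.

```python
import math


def resonator_peak(center_hz, tone_hz, bandwidth_hz=4.0, fs=8000, seconds=1.0):
    """Drive a two-pole resonator with a unit sine tone.

    Returns the peak absolute output over the second half of the run,
    after the start-up transient has largely decayed.
    """
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    peak = 0.0
    n_total = int(fs * seconds)
    for n in range(n_total):
        x = math.sin(2.0 * math.pi * tone_hz * n / fs)
        y = x + a1 * y1 + a2 * y2                # one output per input sample
        y2, y1 = y1, y
        if n >= n_total // 2:
            peak = max(peak, abs(y))
    return peak


on_center = resonator_peak(100.0, 100.0)    # tone at the filter centre
off_center = resonator_peak(100.0, 150.0)   # tone 50 Hz away
```

A tone at the centre frequency produces a far larger output than
one 50 Hz away, which is the selectivity a bank of such filters
relies on. Each filter needs only a few multiplications per sample,
so several hundred of them can be distributed across parallel
processors.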

The maximum absolute amplitude of each frequency band
is determined over a short time interval. The length of
that interval is a balance between shortness for accuracy
in representing the dynamics inherent in speech patterns,
and length to accurately reflect the amplitude of low
frequencies. The duration of one complete wave of a 1 kHz
tone is 1 ms. One wave of a 500 Hz tone is 2 ms and that
of a 250 Hz tone 4 ms. However, a half-wave of a 125 Hz
tone, the pitch of a typical male voice, is also 4 ms and
will contain the maximum value attained in the full wave.
In the preferred embodiment of the invention, a constant
interval of 4 ms is employed over which to evaluate the
maximum absolute value of the amplitude of each frequency
band. A longer time period could be used but the presence
of lower frequencies does not appear to contribute
significantly to intelligibility. Likewise, shorter
intervals could be used for higher frequencies, thereby
achieving greater accuracy in the time domain for those
frequencies. The additional complexity resulting may be
tolerated for some speech analysis applications but was
not considered cost-effective in the preferred embodiment.
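The interval arithmetic above can be checked with a short sketch.
At 8 kHz a 4 ms interval holds 32 samples, and a 125 Hz tone has a
half-wave of exactly 32 samples, so every interval contains the
tone's peak magnitude. The function name frame_peaks and the use
of non-overlapping intervals are assumptions made for illustration.

```python
import math


def frame_peaks(samples, fs=8000, frame_ms=4.0):
    """Maximum absolute value of each non-overlapping frame."""
    frame_len = int(round(fs * frame_ms / 1000.0))   # 32 samples at 8 kHz
    peaks = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        peaks.append(max(abs(v) for v in frame))
    return peaks


# A unit-amplitude 125 Hz tone sampled at 8 kHz (64 samples per cycle).
tone = [math.sin(2.0 * math.pi * 125.0 * n / 8000.0) for n in range(256)]
peaks = frame_peaks(tone)
```

Each 4 ms interval of this tone covers one half-wave, positive or
negative, and so reports the full amplitude of the tone.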

The output of the spectral analysis filter
arrangement is a representation of the speech signal
leaving the vocal tract. However, it is well-known that
human hearing does not have a flat frequency response. It
is considerably less sensitive to the lowest frequencies
in the speech spectrum than the higher ones. Figure 5
illustrates the relative sound level intensity required
for perceived equal loudness. Referring to the "10
Loudness Level (phons)" curve of Figure 5, it can be seen
that a signal of about 30 dB greater sound pressure level
is required for a 100 Hz signal to produce the same
perceived loudness as a 1000 Hz signal. The present
invention modifies the output of the filter bank to
compensate for the frequency response of the ear. In the
preferred embodiment, each of the outputs of the bandpass
filters is multiplied in the spectrum analyzer 4 by the
inverse of the "10 Loudness Level (phons)" curve of Figure
5. This increases the amplitude of the higher frequencies
relative to the lower ones. It can be seen that this has
the effect of somewhat compensating for the phenomenon of
the amplitudes of the pitch harmonics declining at 6 dB per
octave as discussed previously.
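A minimal sketch of this compensation step follows. The curve
points in CURVE are hypothetical placeholders, not values read from
Figure 5, and the treatment of the curve as decibel offsets
converted to linear amplitude gains is an assumption; the names
extra_db and compensate are likewise illustrative.

```python
import bisect

# Hypothetical placeholder points: (frequency in Hz, extra dB of sound
# pressure level needed for equal loudness, relative to 1000 Hz).
CURVE = [(58.0, 40.0), (100.0, 30.0), (1000.0, 0.0), (3800.0, 0.0)]


def extra_db(freq_hz):
    """Linearly interpolate the placeholder curve at freq_hz."""
    freqs = [f for f, _ in CURVE]
    if freq_hz <= freqs[0]:
        return CURVE[0][1]
    if freq_hz >= freqs[-1]:
        return CURVE[-1][1]
    i = bisect.bisect_right(freqs, freq_hz)
    (f0, d0), (f1, d1) = CURVE[i - 1], CURVE[i]
    return d0 + (d1 - d0) * (freq_hz - f0) / (f1 - f0)


def compensate(amplitudes, centers_hz):
    """Scale each band by the inverse of the curve (dB to linear gain)."""
    return [
        a * 10.0 ** (-extra_db(f) / 20.0)
        for a, f in zip(amplitudes, centers_hz)
    ]


weighted = compensate([1.0, 1.0], [100.0, 1000.0])
```

With these placeholder values, a band at 100 Hz is attenuated by
30 dB relative to a band at 1000 Hz, so the higher frequencies are
raised relative to the lower ones.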

Neural Network Phone Identifier 

The neural network phone identifier 5 receives the
output of the spectrum analyzer 4 and inputs it into its
main processor, an artificial neural network that has been
trained to identify the speech sounds or phones which make
up the speech stream. The artificial neural network is
trained by a method described in detail below to recognize
not only phonemes but all legitimate speech sounds in a
language including such sounds as murmurs occurring before
a nasal like "M" and "N", and allophones (or variants) of
phonemes; e.g., as is well-known by those skilled in the
art of phonetics, the acoustic spectrum of the "Z"
occurring at the beginning of a syllable is often
different from that of a "Z" occurring before a silence.
While it is generally accepted that there are only about
40 to 45 phonemes in American English, there are over a
hundred different sounds in the language as just
illustrated. The term phone is used herein to refer to
all such legitimate speech sounds.

The present invention makes use of a fuzzy set
concept of phones. In this concept, every sound made
during speech has a probabilistic membership in all fuzzy
phone sets. However, only when a particular sound's
probability of being in a given set is sufficiently high
and its probability of being in any other set is
sufficiently low is it labelled by the system as belonging
to the given phone set. The differences between this
concept and other concepts used in prior art are
illustrated in Figures 6A-6C. In Fig. 6A, all phones (or
phonemes) in a speech stream are contiguous; i.e., where
one phone (or phoneme) ends, the next one is considered to
begin. Furthermore, all sounds in the stream are a part
of some phone (or phoneme). In Fig. 6B, a sound can be
part of a phone (or phoneme) or it can occur during a
transition from one phone (or phoneme) to the next.
However, the occurrence of a phone (or phoneme) is a
discrete event: the sound either is or is not a phone (or
phoneme), and the probability is either zero or one.

Fig. 6C illustrates that sounds in the speech stream
can have a probabilistic membership in more than one phone
fuzzy set. This follows from the fact that the vocal
tract is a variable configuration mechanical device that
is constantly being re-shaped to produce the desired
sound. There is a unique target position of the vocal
tract for each phone. During speech, sounds are
continually being produced as the vocal tract is
reconfigured to successive positions. Figures 7A-7E are
illustrations of estimated articulatory positions during
pronunciation of the word "caw". Fig. 7A is the estimated
target position for the phoneme /K/ and Fig. 7E is the
estimated target position for the phoneme /AO/ ("A Course
in Phonetics"; Peter Ladefoged; Harcourt Brace Jovanovich
College Publishers; Fort Worth, Texas; 1993). Figs. 7B-7D
are some estimated positions of the vocal tract assumed
during transition between the two target positions. It is
clear that as the vocal tract shape is going away from the
target position for the /K/, the sound produced will be
less and less like that of the /K/. Likewise, as the
shape approaches that of the /AO/, the sound produced will
be more and more like that of the /AO/. In between the
two target positions, the sounds will have varying
similarities to the two target phonemes and indeed may
have some similarities to other phones.
The artificial neural network is trained by a method
described in detail below to identify when a phone is
represented by the sound occurring in each 4 ms interval.
It does this by solving the matrices representing the
network weights applied to the spectral input and
computing the probabilities that the sound represents each
of the phones. If the probability for one of the phones
exceeds a specified threshold, and the probabilities for
all other phones are less than one minus that threshold,
then the signal in that interval is identified as the
phone exceeding the threshold. In one embodiment of the
invention, the BrainMaker Professional neural network
software produced by California Scientific Software is
used both for the training and solution of networks.
Other mechanisms for solving neural networks are available,
such as specialized neural chips, with the result that
alternative designs for implementing the invention in
hardware are possible.
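The selection rule stated above can be written out as a short
sketch. The function name identify_phone, the dictionary of
per-phone probabilities and the threshold value of 0.8 are
hypothetical; only the rule itself (one phone above the threshold,
all others below one minus the threshold) is taken from the
description.

```python
def identify_phone(probs, threshold=0.8):
    """Return the identified phone for one interval, or None.

    probs maps phone labels to the network's output probability for
    the interval. As fuzzy memberships, they need not sum to one.
    """
    best = max(probs, key=probs.get)
    if probs[best] <= threshold:
        return None                      # no phone is confident enough
    limit = 1.0 - threshold
    for name, p in probs.items():
        if name != best and p >= limit:
            return None                  # a competing phone is too likely
    return best


clear = identify_phone({"AH": 0.90, "OW": 0.05, "K": 0.02})
ambiguous = identify_phone({"AH": 0.90, "OW": 0.30})
weak = identify_phone({"AH": 0.60, "OW": 0.10})
```

Intervals that return None correspond to sounds that are not
labelled as belonging to any single phone set, such as transitions
between target positions.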

Artificial neural networks have been applied
successfully to a variety of pattern recognition and
correlation tasks. Methods of configuring, training and
solving artificial neural networks are well-known to those
skilled in the art. In order to apply one effectively to
phone recognition, methods of providing it information
necessary and sufficient for it to be able to recognize
the speech sounds of an arbitrary speaker are required.
Two conditions must be satisfied for accurate recognition.
First, the description of the speech signal presented to
the artificial neural network (for training and
recognition) must be of sufficiently high resolution to
allow it to distinguish between phones in a relatively
crowded speech band. And second, the network must have
previously been trained with the speech samples of a
sufficient number and diversity of speakers of the
language to ensure that the speech patterns on which it is
trained are representative of the speech patterns of the
full population. The spectrum analyzer 4, being designed
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to provide resolution response and resolution similar to 
that of human hearing, satisfies the first condition. 
Regarding the second condition, the empirical results 
obtained in training the neural network phone identifier 
5 5 for reducing this invention to practice show that speech 
samples from hundreds of speakers are required to achieve 
adequate coverage of male and female speakers with low to 
high pitch voices and a wide range of individual 
linguistic mechanisms. The numbers of different speakers 

10 required is discussed further below in connection with 
training the neural network. Figures 8A and 8B show the 
spectra of a few of the hundreds of examples of the vowels 
"AH" (as in "nut") and "OW" (as in "note") presented to 
the ANN for training. As can be seen from Figures 8A-8B, 

15 there is not only a great range of variation within a 
given phone but a great deal of similarity between the two 
phones . 

Artificial neural networks typically have an input
layer of neurons, an output layer and one or more hidden
layers. A schematic diagram of the preferred embodiment
of the phone recognition network is shown in Figure 9.
The output layer simply has one neuron for each of the
phones of the spoken language. The input layer receives
spectral data for the current time interval and a previous
one. As shown in Figure 9, the first neuron represents a
measurement of the speech signal input level. The
remaining neurons are two sets of input data which capture
the rapidly changing dynamics of some phones such as stops
by describing the signal spectrum at a previous interval
and at the current one. The separation between the two
intervals is selected to emphasize the differences in the
spectra. In the preferred embodiment, the separation is
32 ms. It is understood that the optimal separation may
be different for different languages and even for
different dialects and regional accents in a given
language. In each of the two sets, the first neuron gives
the maximum amplitude of any frequency occurring in that
time interval and the remaining ones describe the signal
spectrum relative to that maximum. As indicated
previously, an artificial neural network may incorporate
one or more hidden layers of neurons. Those skilled in
the art of artificial neural network construction will
recognize that no dependable theories or rules-of-thumb
have yet been devised to determine either the optimum
number of hidden layers or the optimum number of neurons
in a hidden layer. In accordance with standard practice
in the field, the number of neurons in the hidden layer(s)
is determined empirically by testing hidden layers with
different numbers of neurons to identify the one yielding
the best performance in terms of accuracy in correctly
identifying the phones in the speech signals of speakers
not included in the population of those on which the
network was trained.
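
The input layout described above can be sketched as follows. This is an illustration under stated assumptions: the names `spectral_set` and `build_input` are hypothetical, "relative to that maximum" is taken to mean a simple ratio, the previous-interval set is placed before the current one, and the handling of a silent interval is invented for the sketch. Only the 4 ms interval and the 32 ms separation come from the text.

```python
from typing import List, Sequence

FRAME_MS = 4        # analysis interval stated in the text
SEPARATION_MS = 32  # preferred-embodiment separation stated in the text


def spectral_set(spectrum: Sequence[float]) -> List[float]:
    """Return [peak amplitude, band amplitudes relative to that peak]."""
    peak = max(spectrum)
    if peak <= 0.0:
        # Silent interval: avoid dividing by zero (assumed handling).
        return [0.0] * (len(spectrum) + 1)
    return [peak] + [amplitude / peak for amplitude in spectrum]


def build_input(frames: Sequence[Sequence[float]],
                levels: Sequence[float], t: int) -> List[float]:
    """Assemble the input vector for frame index t.

    Layout: [signal level, previous-interval set, current-interval set].
    """
    lag = SEPARATION_MS // FRAME_MS  # 8 frames of 4 ms each
    if t < lag:
        raise ValueError("not enough history for the previous interval")
    return [levels[t]] + spectral_set(frames[t - lag]) + spectral_set(frames[t])
```

For a spectrum of B bands the vector has 1 + 2 x (B + 1) entries, so the network input size follows directly from the number of analyzer bands.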

Training the Neural Network 

Training the neural network includes preparing data
to represent the speech characteristics of as much of the
expected user population as possible. Speech samples are
recorded using sets of words to be uttered that contain in
each set one or more examples of each of the specific
phones desired. One way of training a system for the one
hundred plus phones in American English is to train the
neural network on individual sets of approximately ten
phones each and to combine those sets into larger and
larger training sets. It is important to include speakers
in each training set whose collective voices span the
range of pitch frequencies of voices expected to be
encountered in the application. For example, if only
men's voices are needed, a range from about 60 to about
150 Hz should be adequate; if only women's voices are
needed, a range of about 130 to 350 Hz will be required.
If children's speech also is to be recognized, the range
will be extended perhaps as high as 400 Hz.
It is important to have a more or less uniform
distribution of the number of voices at each pitch over
the desired range. The preferred embodiment of the
disclosed invention has approximately forty frequency
bands over the range of voice pitches. It can be
estimated statistically that about fifty different
speakers for each voice pitch should yield high confidence
of population representation. It will be observed in
collecting speech samples for training the system that
voice pitches will tend to cluster about certain
frequencies in approximately normal distributions
separately for men and women (and children also if
included). In collecting speech samples for training the
proof-of-principle system of the present invention, it was
found for that particular sample population that there
were fewer men's voices between 60-100 Hz and 130-150 Hz
than between those ranges. Likewise there were fewer
women's voices in the 150-180 Hz and 250-350 Hz ranges
than in between. One can expect to find a surplus of
mid-frequency pitches, which must be discarded, and to
need additional effort to get a sufficient number of
high and low pitch voices to achieve the desired
uniformity in pitch distribution.
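
The balancing of speakers across pitch bands can be sketched as follows. This is an assumption-laden illustration: the function `balance_by_pitch` is hypothetical, the bands are taken to be of equal width, and surplus speakers are simply dropped in arrival order. The defaults of forty bands and fifty speakers per band are the figures given in the text.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balance_by_pitch(speakers: Sequence[Tuple[str, float]],
                     low_hz: float, high_hz: float,
                     n_bands: int = 40, per_band: int = 50
                     ) -> Tuple[Dict[int, List[str]], Dict[int, int]]:
    """Group speakers into equal-width pitch bands and cap each band.

    Returns (kept, shortfall): the speaker ids kept per band, and the
    number of additional speakers still needed in each underfilled band.
    """
    width = (high_hz - low_hz) / n_bands
    bands: Dict[int, List[str]] = defaultdict(list)
    for speaker_id, pitch_hz in speakers:
        if not low_hz <= pitch_hz <= high_hz:
            continue  # outside the range needed for the application
        index = min(int((pitch_hz - low_hz) / width), n_bands - 1)
        bands[index].append(speaker_id)
    # Discard the surplus in crowded (typically mid-frequency) bands.
    kept = {i: ids[:per_band] for i, ids in bands.items()}
    shortfall = {i: per_band - len(kept.get(i, []))
                 for i in range(n_bands)
                 if len(kept.get(i, [])) < per_band}
    return kept, shortfall
```

The `shortfall` mapping points to the high and low pitch bands where further recruiting effort would be needed.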

The most important part of the training process is to
select the "best" times to represent each phone in a
sample word; i.e., the times at which the probabilities
are highest that the spectra belong to the fuzzy sets of
the sample phones. Referring again to Figures 6A-6C,
those times are the peaks of the curves for the 3-phone
word shown in Fig. 6C. It is extremely helpful in
selecting the times to view the output of the spectrum
analyzer in graphical form. Figures 10A-10C are high
resolution spectrograms for the word "KNOW" uttered by
subject JA9. (It can be observed from the figure that the
subject is probably a woman since the voice pitch is about
180 Hz.) The duration of the displayed portion of the
recording is 600 milliseconds; the figure is split into
three 200 ms parts for convenience of display. Each tick
mark at the top edge of each part represents 20 ms.

Both the murmur before, and the weak plosive release
of, the phone "N" around 300 ms are clear. Thus the
selection of the "best" times for these phones is
facilitated. Selection of best times for other phones
such as vowels may not be so clear cut. This subject,
like many others from whom speech samples were taken,
inserted the phoneme "AH" (as in "nut") between the "N"
and the "OW" so that the pronounced word was N:AH:OW.
Thus the phoneme "OW" does not occur at around 480 ms as
might be supposed from Figure 10 (if one is not aware,
as phoneticians are, that the phoneme /AH/ is frequently
inserted) but instead around 576 ms.

A representative output of the neural network phone
identifier 5 for the sample word KNOW.JA9 is displayed in
Figures 11A-11D. It can be seen from Figures 11A-11D that
at some times (such as around 200 ms) the signal has a
significant probability of belonging to more than one
phone set as was discussed in connection with Figure 6C.
Likewise note the increasing probability for the murmur
(xN) before the N, then its probability decreasing while
the probability of the N increases. Subsequently the
probability of the N decreases while the probability of
the AH increases, and then the probability of the AH
decreases while the probability of the OW increases.

The times selected initially for the thousands of
phone examples in a given training set perhaps will not be
the ones representing the times of maximum probability for
at least some of the phones. During training, the neural
network is looking for consistent patterns. Therefore,
after training, the trained neural network should be
applied to the sample words, and significant differences
between the phone input times and those identified by the
neural network as being the highest probability times
should be noted. The non-optimum sample times then can be
adjusted and the training repeated. This process should
be iterated until the differences reach an acceptably low
level. In addition, testing of the system on new subjects
after the system is trained may result in low
probabilities of phone recognition for some speakers. The
data for such speakers can be fed back into further
training of the system to improve performance.
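
One pass of the comparison step described above might look like the sketch below. The names `peak_time_ms` and `propose_adjustments` and the 8 ms tolerance are assumptions made for illustration. The sketch also simplifies by searching the whole probability track of a phone and by assuming each phone occurs once per sample word.

```python
from typing import Dict, Sequence, Tuple

FRAME_MS = 4  # analysis interval stated in the text


def peak_time_ms(track: Sequence[float], start_ms: int = 0) -> int:
    """Time of the highest network probability in one phone's track."""
    best = max(range(len(track)), key=lambda i: track[i])
    return start_ms + best * FRAME_MS


def propose_adjustments(labels: Dict[str, int],
                        tracks: Dict[str, Sequence[float]],
                        tolerance_ms: int = 8
                        ) -> Dict[str, Tuple[int, int]]:
    """Compare hand-selected phone times with the network's peak times.

    Returns {phone: (labeled_ms, network_peak_ms)} for every phone whose
    two times differ by more than tolerance_ms.
    """
    changes: Dict[str, Tuple[int, int]] = {}
    for phone, labeled_ms in labels.items():
        peak_ms = peak_time_ms(tracks[phone])
        if abs(peak_ms - labeled_ms) > tolerance_ms:
            changes[phone] = (labeled_ms, peak_ms)
    return changes
```

Training would be repeated with the adjusted times until `propose_adjustments` returns few or no entries.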

This same technique is used when training the system 
for a new language. Speech samples from speakers of the 
new language are tested using the existing trained network 
in order to identify those phones for which the system 
already gives satisfactory results versus those that need 
to be trained specifically for that language. It is 
understood that those phones in the new language that are 
not common to a previous language will have to be trained on
speech samples from the new language. 
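
The sorting of phones for a new language can be sketched as below. The function `split_phones`, the recognition-rate dictionary and the 0.9 acceptance rate are hypothetical; the text only says that phones with satisfactory results are kept and the rest are trained on samples from the new language.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_phones(new_language_phones: Iterable[str],
                 trained_phones: Set[str],
                 recognition_rate: Dict[str, float],
                 min_rate: float = 0.9) -> Tuple[List[str], List[str]]:
    """Split phones into (reusable as trained, needing new training)."""
    reuse: List[str] = []
    train: List[str] = []
    for phone in new_language_phones:
        satisfactory = (phone in trained_phones
                        and recognition_rate.get(phone, 0.0) >= min_rate)
        (reuse if satisfactory else train).append(phone)
    return reuse, train
```

A phone absent from the existing network always lands in the second list, which matches the statement that phones not common to a previous language must be trained on samples from the new one.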

Phoneme Integrator 

The artificial neural network identifies which phone
(if any) occurs in each time interval. However, some
phonemes such as vowels are of sustained duration. One
function of the phoneme integrator 6 is to separate
legitimate phones from non-phonetic transitions by
imposing a requirement for a predetermined minimum number
of consecutive identifications to confirm recognition. In
the preferred embodiment of the disclosed invention, eight
consecutive identifications (equivalent to 32 milliseconds
duration) are required to confirm recognition of a vowel,
three consecutive identifications for semi-vowels and
fricatives, and only one for stops and other plosives.
Another of its functions is to ensure that both a murmur
phone (of sufficient duration) and a release phone are
present for phonemes such as voiced stops before
recognition is confirmed. The output of the phoneme
integrator is the phonemic representation of the speech
stream.
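
The consecutive-identification requirement can be sketched as follows. The run lengths are those of the preferred embodiment; the function `integrate`, the class labels and the `phone_class` mapping are hypothetical, and the murmur-plus-release check for voiced stops is left out of this sketch.

```python
from typing import Dict, Iterable, List, Optional

# Minimum consecutive 4 ms identifications, per the preferred embodiment.
MIN_RUN = {"vowel": 8, "semivowel": 3, "fricative": 3, "stop": 1}


def integrate(frames: Iterable[Optional[str]],
              phone_class: Dict[str, str]) -> List[str]:
    """Confirm a phone once its run of identical frames is long enough.

    `frames` holds one identified phone (or None) per interval.
    """
    phonemes: List[str] = []
    current: Optional[str] = None
    run = 0
    emitted = False
    for phone in frames:
        if phone != current:
            current, run, emitted = phone, 0, False
        if phone is None:
            continue
        run += 1
        if not emitted and run >= MIN_RUN[phone_class[phone]]:
            phonemes.append(phone)
            emitted = True  # emit each confirmed run only once
    return phonemes
```

A five-frame run of a vowel is therefore treated as a transition and dropped, while a single frame of a stop is enough to confirm it.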

Phoneme String Translator

The function of the phoneme string translator 7 is to
identify, separate and display (or output to a file) the
spoken language words represented by the phoneme string.
The major components of the translator are a
phonemic-spoken language dictionary and a computer program
that uses that dictionary to convert the phoneme string
into words spelled in the spoken language. An important
feature of the dictionary is the use of multiple phonemic
entries for many of the natural words. This is rendered
necessary because (a) people with different accents often
pronounce a given word differently and (b) transitions
from one phone to another are sometimes a third phone. An
example of (a) is the often different pronunciation of the
word "harbor" by natives of the Northeastern United States
compared to those in the Midwest. An example of (b) is
the frequent transitional "AH" between an "N" and an "OW"
and the insertion of a "W" between an "OW" and an "AH" so
that the word "Noah" can have at least the phonemic
spellings of /N:OW:AH/, /N:AH:OW:AH/, /N:OW:W:AH/ and
/N:AH:OW:W:AH/. The phonemic-spoken language dictionary
has, and uses, all these entries to separate the phoneme
string into spoken language words.
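
A dictionary with several phonemic entries per word can be sketched as a lookup keyed by phoneme tuples. The function `build_lexicon` and the data layout are assumptions for illustration; the four spellings of "Noah" are the ones listed in the text.

```python
from typing import Dict, List, Tuple

Pronunciation = Tuple[str, ...]


def build_lexicon(entries: Dict[str, List[str]]) -> Dict[Pronunciation, str]:
    """Map every phonemic spelling (colon-separated) to its written word."""
    lexicon: Dict[Pronunciation, str] = {}
    for word, spellings in entries.items():
        for spelling in spellings:
            lexicon[tuple(spelling.split(":"))] = word
    return lexicon


NOAH = {"Noah": ["N:OW:AH", "N:AH:OW:AH", "N:OW:W:AH", "N:AH:OW:W:AH"]}
lexicon = build_lexicon(NOAH)
```

Keying by the phoneme sequence makes every variant resolve to the same written word in one lookup. Homophones would need a list of words per key, which this sketch does not handle.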

The computer program design is based on identifying
words in the context of a longer string of phonemes and on
specifically addressing and accounting for co-articulation
effects such as gemination. Before describing the program
it is useful first to identify a frequently occurring
phonetic situation not addressed in the prior art. When
one spoken word ends in a given phoneme, especially a stop
or a fricative, and the next word begins with the same
phoneme, the two phonemes are seldom enunciated
separately. Identifying the location of word separation
is then more complex for a speech recognition system than
when such a situation does not obtain. For example, the
utterance "bad dog" can not be properly separated without
factoring in gemination of the ending and beginning "d".
Otherwise the alternatives are "bad og" and "ba dog",
neither of which identifies both words correctly. In a
small vocabulary application, such a situation may be
avoided by restricting words included in the lexicon, but
it can not be avoided in the unlimited vocabulary
application for which this invention is intended. It is
noted that there are numerous phonemes that are gemination
candidates, including all of the stops and fricatives and
some of the affricates.

The computer program is designed to anticipate
possible gemination occurrences. A logical diagram of the
computer program is shown in Figure 12. The approach
involves using a phoneme string longer than any single
word likely to be encountered. The preferred embodiment
of the invention is based on a 20 phoneme string length
(called MaxString in procedure 10 of Figure 12). The
first 20 phonemes in an utterance (or the actual length if
the utterance is less than 20 phonemes long) are examined
in procedure 11 to find the longest possible first word.
If that word does not end in a gemination candidate, it is
output in procedure 16, the next phoneme becomes the new
starting point in procedure 17, the 20 phoneme length is
replenished in procedure 10, and the process repeated. If
the longest first word does end in a gemination candidate,
procedure 13 extends temporarily the MaxString by a number
of phonemes equal to the number of phonemes in the test
word, then procedure 14 determines whether there is a
following word in the extended MaxString. The presence of
a following word indicates that the phoneme following the
gemination candidate was not co-articulated with the last
phoneme in the preceding word. If there is a following
word, procedure 16 outputs the test word, the next phoneme
becomes the new starting point in procedure 17, the 20
phoneme length is replenished in procedure 10, and the
process repeated. If there is not a second word
commencing after the test word (indicating that
co-articulation has occurred), procedure 15 inserts a
duplicate of the co-articulation candidate phoneme at that
point. As before, procedure 16 outputs the test word, the
next phoneme becomes the new starting point in procedure
17, the 20 phoneme length is replenished in procedure 10,
and the process repeated. This set of procedures is
repeated as long as there are phonemes produced by the
phoneme integrator 6.
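
The procedure described above can be sketched as a longest-match segmenter with a gemination check. This is a reading of the prose, not the program of Figure 12: the names `longest_word` and `segment` are hypothetical, the phoneme symbols in the usage note are illustrative, and two behaviors are assumptions the text does not cover (skipping a phoneme when no word matches, and not duplicating a final phoneme at the end of the utterance).

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

MAX_STRING = 20  # phoneme window length of the preferred embodiment


def longest_word(phonemes: Sequence[str], start: int,
                 lexicon: Dict[Tuple[str, ...], str]
                 ) -> Optional[Tuple[str, int]]:
    """Longest dictionary word beginning at `start` within the window."""
    end_max = min(len(phonemes), start + MAX_STRING)
    for end in range(end_max, start, -1):
        word = lexicon.get(tuple(phonemes[start:end]))
        if word is not None:
            return word, end - start
    return None


def segment(phonemes: Sequence[str],
            lexicon: Dict[Tuple[str, ...], str],
            gemination_candidates: Set[str]) -> List[str]:
    """Split a phoneme string into words, restoring geminated phonemes."""
    stream = list(phonemes)
    words: List[str] = []
    start = 0
    while start < len(stream):
        match = longest_word(stream, start, lexicon)
        if match is None:
            start += 1  # assumed handling: skip an unmatched phoneme
            continue
        word, length = match
        end = start + length
        last = stream[end - 1]
        if last in gemination_candidates and end < len(stream):
            # Look for a following word in the window after the test word.
            if longest_word(stream, end, lexicon) is None:
                stream.insert(end, last)  # duplicate the shared phoneme
        words.append(word)
        start = end
    return words
```

With a lexicon holding `("B", "AE", "D")` for "bad" and `("D", "AO", "G")` for "dog", the co-articulated string `B AE D AO G` yields "bad" and "dog", because no word starts at `AO` and the final `D` of "bad" is therefore duplicated.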

It should be noted that although the basic design of
the system described above assumes that the user normally
will select a specific language to be transcribed prior to
use, the system can be modified to automatically determine
which of the languages within its repertoire is being
spoken and to select the appropriate artificial neural
network and language dictionary for use. This can be
accomplished by processing a brief initial portion of the
speech, say 5 to 10 seconds in duration, through each of
the languages to identify the language that produces a
string of real words. The language for which the system
identifies a string of real words is selected and the
system operates from that point on as described above.
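
Automatic language selection could be sketched as below. The scoring rule (fraction of recognized words) and the 0.5 cutoff are assumptions, since the text only says that the language producing a string of real words is selected. The function `select_language` and the per-language transcriber callables are hypothetical stand-ins for the full recognition pipeline of each language.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A transcriber returns one entry per segment: a word, or None if the
# segment matched no dictionary entry for that language.
Transcriber = Callable[[Sequence[float]], List[Optional[str]]]


def select_language(sample: Sequence[float],
                    transcribers: Dict[str, Transcriber],
                    min_fraction: float = 0.5) -> Optional[str]:
    """Pick the language whose pipeline turns the sample into real words."""
    best_language: Optional[str] = None
    best_score = 0.0
    for language, transcribe in transcribers.items():
        tokens = transcribe(sample)
        if not tokens:
            continue
        score = sum(token is not None for token in tokens) / len(tokens)
        if score > best_score:
            best_language, best_score = language, score
    return best_language if best_score >= min_fraction else None
```

Returning None when no language scores well leaves room to fall back to the manual selection that the basic design assumes.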

System Implementation in Hardware

The method and system disclosed herein may require
concurrent processing for real time operation unless
implemented on a "super computer"; however, it is intended
primarily for widespread use and the preferred
implementation is on a "personal computer" or
"workstation" class of machine. While the equipment of
several manufacturers may have suitable characteristics
for some of the various components of the system, a
particular arrangement as shown in Figures 13A and 13B will
be described for purposes of illustration. As mentioned
above, Loral Space Information Systems has developed an
arrangement of five C-programmable Texas Instruments
TMS320C30 DSP chips on two plug-in boards 105 and 106 that
can provide adequate computing speed for solving the
equations for several hundred narrow bandpass filters in
real time. A second set of boards 103 and 104 can be
dedicated to solving the neural network equations. These
two sets of boards can be installed for example in a
Compaq SystemPro Model 66M microcomputer which has
provision for two independent processor boards 110 and 111
that can share the same memory installed on boards 108 and
109. One of these processors accomplishes the phoneme
integration 6 function while the other serves both as
the control processor for language selection and to
provide Phonemic-to-Spoken Language Translation and text
output. Another plug-in board 107 such as the Media
Vision Pro Audio Spectrum 16 can provide the
analog-to-digital conversion function and its accompanying
software can support waveform display and editing for
assembling speech samples for language training and
testing. The SystemPro computer has two remaining empty
slots available.

Claims

1. A multi-language speech recognition system
comprising

an analog-to-digital converter for converting speech
sounds into digital information,

an analyzer for receiving said digital information
and determining, with the frequency discrimination and
frequency response of human hearing, a spectrum of said
speech sounds,

a phone identifier for receiving said spectrum from
said spectrum analyzer, said phone identifier comprising
a network for identifying which phones, if any, occur
within a specified time interval of said spectrum, said
network being capable of recognizing phones of a
predetermined language,

a phoneme integrator for separating phones from
non-phonetic transitions identified by said phone identifier
by detecting a predetermined minimum number of consecutive
identifications of said identified phones to confirm
recognition, said phoneme integrator providing as an
output a phoneme string representative of identified
phones from said spectrum, and

a phoneme string translator for identifying,
separating, and displaying or storing to a file, the human
language words represented by said phoneme string, said
phoneme string translator comprising a phonemic-spoken
language dictionary and a program that uses said
dictionary to convert the phoneme string into text of the
spoken language.

2. The system of Claim 1, wherein said network is
pretrained to recognize phones in any one of several given
human languages.

3. The system of Claim 1, wherein said system
accommodates multiple pronunciations of a word.

4. The system of Claim 1, wherein said system
transcribes continuous conversational speech of an
arbitrary speaker into one of several languages.

5. The system of Claim 1, wherein said network
identifies a time frame within said spectrum which most
closely approaches a vocal tract configuration of a target
position of a given phone.

6. The system of Claim 1, wherein said system
addresses and accounts for coarticulation effects such as
gemination which occur in conversational speech.



7. A method for multi-language speech recognition
comprising the following steps

receiving an analog speech sound input and converting
said input into a digital output,

receiving said digital output and determining, with
the frequency discrimination and frequency response of
human hearing, a spectrum of said speech sounds,

receiving said spectrum from said spectrum analyzer,
and identifying which phones, if any, occur within a
specified time interval of said spectrum, by comparing
said spectrum with information in a network to recognize
phones of a predetermined language,

separating phones from non-phonetic transitions
identified by said comparison by detecting a predetermined
minimum number of consecutive identifications of said
identified phones to confirm recognition, and providing as
an output a phoneme string representative of identified
phones from said radiated spectrum, and

identifying, separating, and displaying or storing to
a file, the human language words represented by said
phoneme string, by use of a phoneme string translator
comprising a phonemic-spoken language dictionary and a
program that uses said dictionary, to convert the phoneme
string into text of the spoken language.



WO 95/02879 



PCT/US94/07742 




8. The method of Claim 7, wherein said network is
pretrained to recognize phones in any one of several given
human languages.

9. The method of Claim 7, wherein said method makes
efficient use of common phones which exist in various
human languages when adding additional language
capabilities to said method.

10. The method of Claim 7, wherein said method
accommodates multiple pronunciations of a word.

11. The method of Claim 7, wherein said method
transcribes continuous conversational speech of an
arbitrary speaker into one of several languages.

12. The method of Claim 7, wherein said network
identifies a time frame within said spectrum which most
closely approaches a vocal tract configuration of a target
position of a given phone.

13. The method of Claim 7, wherein said method
addresses and accounts for coarticulation effects such as
gemination which occur in conversational speech.

14. A multi-language speech recognition system
comprising

means for receiving audio speech signals in a
predetermined language and for converting them into
corresponding electrical signals,

an analog-to-digital converter for sampling said
frequencies at a rate at least twice as high as a
predetermined maximum frequency of interest in said
signals,

a spectrum analyzer for accepting sets of samples
from said analog-to-digital converter over a time interval
of between 1 millisecond to 8 milliseconds, and for
providing an analysis of the spectral content of each said
set of samples simulating the frequency discrimination and
sensitivity response characteristics of human hearing,

an artificial neural network for identifying whether
each said set of samples probably represents the audio
spectrum of one of a predetermined set of phones belonging
to said spoken language,

an integrator for integrating sufficient
predetermined minimum numbers of said probabilistic
identifications of successive said sets of samples to
confirm existence and recognition of said phones,

means for integrating said phones into phonemes in
said spoken language,

a translator for translating sequences of said
phonemes into words of said spoken language,

said translator including means for (1) interpreting
transitions between two legitimate speech sounds that are
third legitimate speech sounds and (2) identifying an
unpronounced speech sound when said speech sound is
co-articulated with a neighboring speech sound, and

means for displaying, printing and/or storing text
corresponding to the translated words.

[FIG. 11a: output of the neural network phone identifier
for the sample word KNOW.JA9. Only the R, W, xN, Zr and
Zc probability columns survive in this copy, without the
time index or the remaining columns, so the table is not
reproduced here.]



224.. 0.000AH 0.001IY 0.002OW 0.005N 0.925R 0.011W 0.089xN 0.008Zr 0.000Zc
228.. 0.000AH 0.009IY 0.002OW 0.006N 0.940R 0.011W 0.126xN 0.003Zr 0.001Zc
232.. 0.000AH 0.013IY 0.002OW 0.020N 0.819R 0.029W 0.334xN 0.011Zr 0.004Zc
236.. 0.001AH 0.016IY 0.006OW 0.046N 0.519R 0.062W 0.680xN 0.027Zr 0.011Zc
240.. 0.001AH 0.018IY 0.006OW 0.070N 0.425R 0.037W 0.835xN 0.029Zr 0.008Zc
244.. 0.001AH 0.015IY 0.010OW 0.052N 0.409R 0.046W 0.872xN 0.030Zr 0.016Zc
248xN 0.003AH 0.009IY 0.015OW 0.063N 0.242R 0.061W 0.906xN 0.027Zr 0.014Zc
252xN 0.005AH 0.011IY 0.019OW 0.066N 0.223R 0.054W 0.915xN 0.042Zr 0.026Zc
256xN 0.003AH 0.009IY 0.020OW 0.079N 0.204R 0.029W 0.952xN 0.028Zr 0.018Zc
260xN 0.004AH 0.010IY 0.026OW 0.117N 0.102R 0.061W 0.952xN 0.051Zr 0.020Zc
264xN 0.005AH 0.007IY 0.029OW 0.104N 0.102R 0.056W 0.935xN 0.066Zr 0.019Zc
268xN 0.004AH 0.007IY 0.034OW 0.098N 0.115R 0.051W 0.882xN 0.079Zr 0.023Zc
272xN 0.003AH 0.006IY 0.033OW 0.101N 0.113R 0.053W 0.906xN 0.064Zr 0.023Zc
276xN 0.004AH 0.006IY 0.035OW 0.125N 0.081R 0.053W 0.917xN 0.094Zr 0.019Zc
280xN 0.004AH 0.004IY 0.040OW 0.103N 0.050R 0.046W 0.960xN 0.068Zr 0.014Zc
284xN 0.003AH 0.005IY 0.032OW 0.147N 0.067R 0.035W 0.920xN 0.088Zr 0.020Zc
288xN 0.004AH 0.008IY 0.048OW 0.096N 0.034R 0.052W 0.964xN 0.066Zr 0.016Zc
292xN 0.005AH 0.007IY 0.045OW 0.093N 0.048R 0.040W 0.957xN 0.075Zr 0.019Zc
296xN 0.005AH 0.006IY 0.032OW 0.143N 0.044R 0.038W 0.951xN 0.089Zr 0.019Zc
300N  0.005AH 0.000IY 0.002OW 0.843N 0.035R 0.018W 0.241xN 0.125Zr 0.017Zc
304N  0.003AH 0.001IY 0.002OW 0.833N 0.016R 0.031W 0.187xN 0.216Zr 0.005Zc
308N  0.008AH 0.000IY 0.000OW 0.979N 0.001R 0.023W 0.009xN 0.072Zr 0.004Zc
312N  0.012AH 0.000IY 0.000OW 0.980N 0.002R 0.018W 0.003xN 0.092Zr 0.00?Zc
316N  0.014AH 0.000IY 0.000OW 0.988N 0.004R 0.011W 0.006xN 0.064Zr 0.006Zc
320N  0.014AH 0.000IY 0.000OW 0.986N 0.007R 0.007W 0.009xN 0.077Zr 0.007Zc
324N  0.019AH 0.000IY 0.000OW 0.988N 0.003R 0.007W 0.008xN 0.092Zr 0.007Zc
328.. 0.088AH 0.008IY 0.013OW 0.345N 0.060R 0.040W 0.183xN 0.563Zr 0.025Zc
332.. 0.060AH 0.149IY 0.048OW 0.187N 0.186R 0.015W 0.189xN 0.425Zr 0.030Zc
336.. 0.068AH 0.181IY 0.058OW 0.088N 0.181R 0.010W 0.091xN 0.453Zr 0.020Zc
340.. 0.124AH 0.108IY 0.079OW 0.032N 0.203R 0.006W 0.037xN 0.404Zr 0.036Zc
344.. 0.197AH 0.046IY 0.099OW 0.028N 0.067R 0.008W 0.056xN 0.585Zr 0.038Zc
348.. 0.212AH 0.023IY 0.047OW 0.024N 0.043R 0.005W 0.045xN 0.443Zr 0.025Zc
352.. 0.356AH 0.011IY 0.044OW 0.018N 0.020R 0.007W 0.038xN 0.512Zr 0.030Zc
356.. 0.493AH 0.006IY 0.037OW 0.020N 0.016R 0.006W 0.034xN 0.442Zr 0.026Zc
360AH 0.629AH 0.000IY 0.027OW 0.028N 0.009R 0.005W 0.025xN 0.376Zr 0.029Zc
364AH 0.640AH 0.003IY 0.027OW 0.022N 0.014R 0.008W 0.015xN 0.375Zr 0.021Zc
368AH 0.722AH 0.002IY 0.024OW 0.015N 0.015R 0.007W 0.010xN 0.390Zr 0.023Zc
372AH 0.830AH 0.001IY 0.021OW 0.016N 0.023R 0.005W 0.005xN 0.289Zr 0.027Zc
376AH 0.841AH 0.001IY 0.017OW 0.021N 0.011R 0.007W 0.003xN 0.341Zr 0.048Zc
380AH 0.859AH 0.001IY 0.017OW 0.014N 0.025R 0.006W 0.003xN 0.290Zr 0.024Zc
384AH 0.910AH 0.000IY 0.009OW 0.015N 0.020R 0.005W 0.001xN 0.221Zr 0.033Zc
388AH 0.944AH 0.000IY 0.005OW 0.024N 0.013R 0.003W 0.001xN 0.158Zr 0.041Zc
392AH 0.948AH 0.000IY 0.005OW 0.028N 0.035R 0.001W 0.001xN 0.162Zr 0.033Zc
396AH 0.962AH 0.000IY 0.003OW 0.044N 0.014R 0.001W 0.001xN 0.104Zr 0.027Zc
400AH 0.966AH 0.000IY 0.004OW 0.022N 0.014R 0.001W 0.001xN 0.125Zr 0.025Zc
404AH 0.968AH 0.000IY 0.004OW 0.022N 0.006R 0.001W 0.001xN 0.153Zr 0.050Zc
408AH 0.957AH 0.000IY 0.007OW 0.015N 0.020R 0.001W 0.002xN 0.178Zr 0.037Zc
412AH 0.953AH 0.000IY 0.006OW 0.011N 0.014R 0.001W 0.002xN 0.172Zr 0.036Zc
416AH 0.958AH 0.000IY 0.008OW 0.005N 0.009R 0.001W 0.003xN 0.227Zr 0.035Zc
420AH 0.955AH 0.000IY 0.007OW 0.005N 0.002R 0.002W 0.003xN 0.282Zr 0.047Zc
424AH 0.951AH 0.000IY 0.010OW 0.004N 0.006R 0.000W 0.004xN 0.276Zr 0.039Zc
428AH 0.936AH 0.001IY 0.013OW 0.004N 0.009R 0.001W 0.004xN 0.264Zr 0.049Zc
432AH 0.958AH 0.001IY 0.010OW 0.004N 0.009R 0.001W 0.003xN 0.172Zr 0.057Zc
436AH 0.965AH 0.000IY 0.013OW 0.005N 0.004R 0.002W 0.002xN 0.195Zr 0.059Zc
440AH 0.953AH 0.001IY 0.017OW 0.004N 0.010R 0.001W 0.003xN 0.227Zr 0.051Zc
444AH 0.925AH 0.001IY 0.024OW 0.003N 0.014R 0.003W 0.002xN 0.186Zr 0.066Zc
448AH 0.959AH 0.000IY 0.011OW 0.004N 0.012R 0.002W 0.002xN 0.106Zr 0.054Zc
452AH 0.972AH 0.000IY 0.009OW 0.003N 0.004R 0.002W 0.001xN 0.111Zr 0.068Zc
456AH 0.967AH 0.000IY 0.012OW 0.004N 0.007R 0.002W 0.001xN 0.067Zr 0.050Zc
460AH 0.915AH 0.000IY 0.016OW 0.002N 0.012R 0.003W 0.001xN 0.109Zr 0.055Zc
464AH 0.899AH 0.000IY 0.020OW 0.002N 0.015R 0.005W 0.001xN 0.073Zr 0.054Zc
468AH 0.851AH 0.001IY 0.057OW 0.001N 0.009R 0.005W 0.001xN 0.067Zr 0.055Zc
472AH 0.877AH 0.000IY 0.075OW 0.001N 0.006R 0.006W 0.001xN 0.049Zr 0.069Zc
476AH 0.882AH 0.001IY 0.060OW 0.001N 0.006R 0.004W 0.001xN 0.080Zr 0.050Zc
480AH 0.797AH 0.001IY 0.113OW 0.000N 0.004R 0.009W 0.001xN 0.122Zr 0.059Zc
484AH 0.737AH 0.001IY 0.229OW 0.000N 0.002R 0.009W 0.001xN 0.108Zr 0.038Zc
488AH 0.813AH 0.001IY 0.153OW 0.000N 0.002R 0.016W 0.001xN 0.115Zr 0.042Zc
492AH 0.642AH 0.002IY 0.185OW 0.001N 0.004R 0.034W 0.001xN 0.140Zr 0.029Zc
496.. 0.458AH 0.001IY 0.200OW 0.001N 0.006R 0.129W 0.001xN 0.082Zr 0.023Zc
500.. 0.306AH 0.002IY 0.304OW 0.001N 0.004R 0.200W 0.001xN 0.088Zr 0.019Zc
504.. 0.159AH 0.003IY 0.598OW 0.001N 0.002R 0.165W 0.002xN 0.064Zr 0.010Zc
508OW 0.134AH 0.002IY 0.618OW 0.001N 0.004R 0.118W 0.002xN 0.044Zr 0.016Zc
512.. 0.128AH 0.003IY 0.570OW 0.001N 0.003R 0.202W 0.002xN 0.023Zr 0.009Zc
516OW 0.094AH 0.005IY 0.791OW 0.000N 0.002R 0.068W 0.004xN 0.011Zr 0.006Zc
520OW 0.243AH 0.005IY 0.629OW 0.001N 0.002R 0.019W 0.006xN 0.011Zr 0.008Zc
524OW 0.133AH 0.009IY 0.771OW 0.002N 0.003R 0.024W 0.011xN 0.006Zr 0.011Zc
528OW 0.167AH 0.009IY 0.743OW 0.002N 0.003R 0.015W 0.014xN 0.004Zr 0.007Zc
532OW 0.215AH 0.011IY 0.621OW 0.003N 0.003R 0.017W 0.017xN 0.004Zr 0.012Zc
536.. 0.186AH 0.016IY 0.402OW 0.006N 0.004R 0.058W 0.022xN 0.012Zr 0.015Zc
540.. 0.110AH 0.013IY 0.534OW 0.005N 0.005R 0.061W 0.026xN 0.005Zr 0.012Zc
544OW 0.139AH 0.013IY 0.621OW 0.003N 0.005R 0.018W 0.029xN 0.004Zr 0.011Zc
548.. 0.162AH 0.015IY 0.480OW 0.004N 0.003R 0.039W 0.025xN 0.006Zr 0.014Zc
552OW 0.127AH 0.013IY 0.637OW 0.004N 0.003R 0.019W 0.030xN 0.003Zr 0.015Zc
556OW0.075AH 0.011IY 0.837OW 0.003N 0.002R 0.009W 0.036xN O.OOIZr 0.012ZC 
560OW 0.083AH 0.01 OIY 0.884OW 0.002N 0.002R 0.003W 0.040xN O.OOIZr 0.01 1Zc 
564OW0.135AH 0.009IY 0.908OW 0.001 N 0.002R 0.001 W 0.037xN O.OODZr 0 008Zc 
568OW0.149AH 0.008IY 0.932OW 0.001N 0.001R 0.001W 0.056xN O.OOOZr 0.005ZC 
572OW0.110AH 0.008IY 0.950OW 0.002N 0.001R 0.001W 0.060xN O.OOOZr 0.008ZC 
5760W 0.068AH 0.007IY 0.967OW 0.001N 0.002R 0.001W 0.077xN O.OOOZr 0.005Zc 
580OW 0.074AH 0.006IY 0.958OW 0.002N 0.002R 0.001W 0.080xN O.OOOZr 0.004ZC 
5840W 0.120AH 0.008IY 0.912OW 0.006N 0.002R 0.002W 0.091xN O.OOOZr 0.005Zc 
5880W 0.108AH 0.009IY 0.922OW 0.008N 0.002R 0.003W 0.081xN O.OOIZr 0.006Zc 
5920W 0.066AH 0.007IY 0.952OW 0.007N 0.002R 0.006W 0.060xN O.OOIZr 0.005Zc 
5960W 0.078AH 0.006IY 0.932OW 0.010N 0.003R 0.007W 0.058xN O.OOIZr 0.004Zc 
600OW 0.108AH 0.005IY 0.892OW 0.01 7N 0.002R 0.01 OW 0.066xN O.OOIZr 0.005Zc 
604OW0.119AH 0.005IY 0.903OW 0.015N 0.002R 0.011W 0.050xN O.OOIZr 0.006Zc 
608OW0.080AH 0.006IY 0.950OW 0.007N 0.001R 0.012W 0.053xN O.OOOZr 0.004ZC 
612OW0.113AH 0.006IY 0.925OW 0.005N 0.001R 0.012W 0.047xN O.OOIZr 0.002Zc 
6160W 0.158AH 0.005IY 0.866OW 0.007N 0.001 R 0.027W 0.043xN O.OOIZr 0.002Zc 
620OW 0.182AH 0.005IY 0.814OW 0.006N 0.001R 0.047W 0.032xN 0.002Zr 0.002ZC 
624OW0.116AH 0.005IY 0.891OW 0.002N O.OOOR 0.112W 0.018xN O.OOIZr O.OOIZc 
6280W 0.087AH 0.004IY 0.860OW 0.002N O.OOOR 0.1 54W 0.024xN O.OOIZr O.OOOZc 
632OW0.105AH 0.003IY 0.865OW 0.002N 0.001R 0.1 19W 0.028xN O.OOOZr O.OOIZc 
6360W 0.068AH 0.003IY 0.906OW 0.003N 0.001 R 0.086W 0.046xN O.OOOZr O.OOIZc 
640OW 0.065AH 0.002IY 0.928OW 0.002N O.OOOR 0.075W 0.040xN O.OOOZr O.OOOZc 
6440W 0.031 AH 0.001 IY 0.927OW 0.002N 0.001 R 0.085W 0.037xN O.OOOZr O.OOOZc 
648OW 0.038AH 0.001 IY 0.914OW 0.002N 0.001 R 0.073W 0.061 xN O.OOOZr O.OOOZc 
6520W 0.037AH 0.001IY 0.861OW 0.004N 0.003R 0.053W 0.102xN O.OOOZr O.OOIZc 
656OW 0.039AH 0.001IY 0.882OW 0.003N 0.002R 0.052W 0.101xN O.OOOZr O.OOIZc 
660OW 0.032AH 0.001 IY 0.863OW 0.004N 0.003R 0.055W 0.098xN O.OOOZr O.OOIZc 
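
The relationship between the nine output values in each row and the label printed beside the frame time can be illustrated with a short sketch. It assumes that a frame is labeled with its highest-scoring phone only when that score reaches a threshold, and that the frame is otherwise marked "..". The threshold of 0.6 is an assumption inferred from the rows above (0.598 is left unlabeled, 0.618 is labeled OW) and is not stated in this table. The names PHONES, THRESHOLD and label_frame are hypothetical and serve only this illustration.

```python
# Phone labels in the column order used by the table rows above.
PHONES = ["AH", "IY", "OW", "N", "R", "W", "xN", "Zr", "Zc"]

# Assumed decision threshold, inferred from the table rows; not given in the text.
THRESHOLD = 0.6


def label_frame(scores, threshold=THRESHOLD):
    """Return the best-scoring phone, or ".." if no output reaches the threshold."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return PHONES[best] if scores[best] >= threshold else ".."


# Three rows as read from the table, keyed by frame time.
frames = {
    492: [0.642, 0.002, 0.185, 0.001, 0.004, 0.034, 0.001, 0.140, 0.029],
    504: [0.159, 0.003, 0.598, 0.001, 0.002, 0.165, 0.002, 0.064, 0.010],
    508: [0.134, 0.002, 0.618, 0.001, 0.004, 0.118, 0.002, 0.044, 0.016],
}

for time, scores in frames.items():
    print(time, label_frame(scores))
```

Under this assumption the sketch labels frame 492 as AH, leaves frame 504 unlabeled and labels frame 508 as OW, which agrees with the labels shown in those rows.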